INTRODUCTION

The Ministry of the Environment operates a large air quality monitoring program that
produces large quantities of high quality data on both local and regional scale problems. With
support from the Ministry, we have examined several multivariate methods that can extract
useful information from these data sets regarding the origins of airborne pollutants and their transport
in the environment, with the aim of providing these methods to the Air Resources Branch in the
form of computer programs. The results of the work supported by Grant 433G over the period
1989 to 1992 are described in this report.


THREE-WAY DATA ANALYSIS 
Three-Mode Factor Analysis 

In the studies conducted during the past three years, we have examined the use of three-mode 
factor analysis for the combined spatial and temporal analysis of environmental data. The concept of 
this approach is that environmental data are obtained at multiple locations at multiple times. 
Typically, eigenvector methods like principal components analysis and factor analysis have examined 
only one aspect of this system. However, a new method, three-mode factor analysis (Kroonenberg,
1983), offered the possibility of simultaneously examining both the spatial and temporal covariance
structure of a data set. We obtained the programs to perform these analyses from Dr. Kroonenberg
and, as part of our previous Ministry grant-supported studies, we examined the utility of this method
using two simulated data sets having known, well-defined structures. The initial simulation study
(Zeng and Hopke, 1989a) concluded that additional developmental work was needed to incorporate 
orthogonal rotations of the axes in order to arrive at a "simple structure" criterion similar to that used 
in ordinary factor analysis (Hopke et al., 1976). This additional work was completed and the results 
of the entire study of three-mode factor analysis were presented by Zeng and Hopke (1990). The 
method was also applied to real world data with limited success (Zeng and Hopke, 1992a). 

In the other studies of three-mode factor analysis, we tried to extend it to a quantitative source 
apportionment model as a three dimensional analog of target transformation factor analysis (TTFA) 
(Hopke, 1988; 1989). However, we found that even with error-free data, it was not possible to 
accurately reproduce the true simulation input values (Zeng, 1990). Thus, we began to consider
alternative approaches to the analysis of three-way data.


Direct Trilinear Decomposition 
Another approach to analyzing three-way data has been developed in Professor Bruce
Kowalski's group at the University of Washington. This method is called the Direct Trilinear
Decomposition (DTD) (Sanchez and Kowalski, 1990). They developed the DTD method to process
data obtained in analytical chemistry with instruments that generate two-dimensional arrays of data
for multiple samples. Their example was a three-way data set of emission-excitation fluorescence
spectra for three mixtures of three organic compounds. With the DTD method, the emission and
excitation spectra of all three compounds were predicted and the relative concentrations of the
compounds in the mixtures were calculated.


The following equation represents the trilinear model (Sanchez and Kowalski, 1990):

$$\mathbf{X} = \sum_{n=1}^{N} \mathbf{a}_n \otimes \mathbf{b}_n \otimes \mathbf{c}_n + \mathbf{E} \tag{1}$$

Note that $\mathbf{X}$ and $\mathbf{E}$ represent three-way matrices, and $x_{ijk}$ and $e_{ijk}$ are their corresponding elements.
$\mathbf{X}$ is the observed data matrix and $\mathbf{E}$ is the model error matrix. Vectors $\mathbf{a}_n$, $\mathbf{b}_n$, and $\mathbf{c}_n$ are the n-th
column vectors of matrices A, B, and C, and they represent the spectra or profiles of the n-th
component in the first, second, and third mode, respectively. The vector $\mathbf{a}_n$ is I-dimensional,
$\mathbf{b}_n$ is J-dimensional, and $\mathbf{c}_n$ is K-dimensional. Equation (1) can be rewritten in scalar form:

$$x_{ijk} = \sum_{n=1}^{N} a_{in} b_{jn} c_{kn} + e_{ijk} \tag{2}$$
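
As an illustration of equation 2, the following minimal sketch builds an error-free three-way array from three loading matrices. The function name `trilinear`, the array sizes, and the random test values are assumptions made for this example and are not taken from the programs described in this report.

```python
import numpy as np

def trilinear(A, B, C):
    """Return X with x[i, j, k] = sum_n A[i, n] * B[j, n] * C[k, n] (equation 2, no error term)."""
    return np.einsum("in,jn,kn->ijk", A, B, C)

# Illustrative sizes only: I species, J periods, K sites, N sources.
rng = np.random.default_rng(0)
I, J, K, N = 5, 4, 3, 2
A = rng.random((I, N))   # columns a_n
B = rng.random((J, N))   # columns b_n
C = rng.random((K, N))   # columns c_n
X = trilinear(A, B, C)   # three-way array of shape (I, J, K)
```

Each slice `X[:, :, k]` is then the sum over n of `C[k, n]` times the outer product of `A[:, n]` and `B[:, n]`.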


The trilinear model is also called Parallel Factor Analysis (PARAFAC) or CANDECOMP model in 
the psychology literature (Kroonenberg, 1983; Harshman, 1970, 1976; Carroll and Chang, 1970). 
The procedure for performing a trilinear decomposition based on the model in equation (1) is 
outlined in this subsection. A more detailed description and discussion and the algorithms can be 
found in Sanchez and Kowalski (1990). For the special case in which the three-way data block
contains only two slices in mode 3 (K = 2), $\mathbf{a}_n$ and $\mathbf{b}_n$ can be solved using the generalized rank
annihilation method (Sanchez and Kowalski, 1988). In general, there are more than two slices
(K > 2). In order to use the generalized rank annihilation method, two linear combinations of all
slices need to be found. These two matrices may be generated using a two-component Tucker-1
model (Tucker, 1966; Kroonenberg, 1983):


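$$\mathbf{X} = \sum_{q=1}^{2} \mathbf{D}_q \otimes \mathbf{u}_q + \mathbf{E} \tag{3}$$

Here the $\mathbf{D}_q$ are I x J matrices and the orthogonal vectors $\{\mathbf{u}_q\}$ belong to the $\{\mathbf{c}_n\}$ space.

$$\mathbf{D}_q = \sum_{k=1}^{K} u_{kq} \mathbf{X}_k \tag{4}$$

A suitable choice for the two $\{\mathbf{u}_q\}$ vectors is the principal components of the unfolded $\mathbf{X}$ in the $\{\mathbf{c}_n\}$
space.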
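
The construction in equations 3 and 4 can be sketched as follows. This is a minimal illustration, assuming that the principal components are taken from a singular value decomposition of the uncentered unfolded array; the function name `two_pseudo_slices` and the test data are hypothetical.

```python
import numpy as np

def two_pseudo_slices(X):
    """X has shape (I, J, K). Returns D with shape (2, I, J) and U with shape (K, 2)."""
    I, J, K = X.shape
    unfolded = X.reshape(I * J, K)          # rows: (i, j) pairs, columns: slices k
    # Right singular vectors are orthonormal directions in the mode-3 space.
    # Assumption: no mean-centering is applied before the decomposition.
    _, _, Vt = np.linalg.svd(unfolded, full_matrices=False)
    U = Vt[:2].T                            # u_kq for q = 1, 2
    D = np.einsum("ijk,kq->qij", X, U)      # D_q = sum_k u_kq X_k (equation 4)
    return D, U

# Error-free two-component example with K = 4 slices.
rng = np.random.default_rng(0)
A, B, C = rng.random((5, 2)), rng.random((4, 2)), rng.random((4, 2))
X = np.einsum("in,jn,kn->ijk", A, B, C)
D, U = two_pseudo_slices(X)
X_back = np.einsum("qij,kq->ijk", D, U)     # right-hand side of equation 3 without E
```

For error-free data with two components, the two pseudo-slices reproduce the full array, so `X_back` agrees with `X` to rounding error.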
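On the other hand, a slice of the data (e.g., the k-th slice) can be expressed according to the
model (equation 1) as:

$$\mathbf{X}_k = \sum_{n=1}^{N} c_{kn} \, \mathbf{a}_n \otimes \mathbf{b}_n \tag{5}$$

Combining equations 4 and 5:

$$\mathbf{D}_q = \sum_{k=1}^{K} u_{kq} \sum_{n=1}^{N} c_{kn} \, \mathbf{a}_n \otimes \mathbf{b}_n = \sum_{n=1}^{N} (\mathbf{a}_n \otimes \mathbf{b}_n) \sum_{k=1}^{K} c_{kn} u_{kq} \tag{6}$$

Defining

$$v_{nq} = \sum_{k=1}^{K} c_{kn} u_{kq} \tag{7}$$

then for $\mathbf{D}_1$ and $\mathbf{D}_2$:

$$\mathbf{D}_1 = \sum_{n=1}^{N} v_{n1} \, \mathbf{a}_n \otimes \mathbf{b}_n, \qquad \mathbf{D}_2 = \sum_{n=1}^{N} v_{n2} \, \mathbf{a}_n \otimes \mathbf{b}_n \tag{8}$$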


Now generalized rank annihilation can be applied to these two matrices. If the $\{\mathbf{b}_n\}$ vectors are
linearly independent, and $\mathbf{b}_n \cdot \mathbf{b}_q = \delta_{nq}$, the Kronecker delta ($\delta_{nq} = 1$ for $n = q$ and $\delta_{nq} = 0$ for $n \neq q$), then

$$\mathbf{D}_1 \mathbf{b}_n = v_{n1} \mathbf{a}_n, \qquad \mathbf{D}_2 \mathbf{b}_n = v_{n2} \mathbf{a}_n \tag{9}$$

Eliminating $\mathbf{a}_n$ from these equations, the generalized eigenvalue-eigenvector problem is obtained
(Sanchez and Kowalski, 1986):

$$\mathbf{D}_1 \mathbf{b}_n v_{n2} = \mathbf{D}_2 \mathbf{b}_n v_{n1} \tag{10}$$

where the $\{\mathbf{b}_n\}$ vectors are the eigenvectors. Equation 10 can be solved by projecting $\mathbf{D}_1$ and $\mathbf{D}_2$ to square
matrices and then applying the QZ algorithm for simultaneous diagonalization (Sanchez and Kowalski,
1986; 1988).
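
A minimal sketch of this step is given below. It is not the implementation of Sanchez and Kowalski: the QZ algorithm is replaced here by an ordinary eigen-decomposition of the projected matrices, which is adequate only for well-conditioned, nearly error-free data. The function name `gram`, the projection through a singular value decomposition of the sum of the two matrices, and the test values are assumptions made for this example.

```python
import numpy as np

def gram(D1, D2, n_components):
    """Estimate A (I x N) and B (J x N) up to column scaling and ordering."""
    # Project both matrices onto a common N-dimensional column and row space.
    U, _, Vt = np.linalg.svd(D1 + D2, full_matrices=False)
    Up, Vp = U[:, :n_components], Vt[:n_components].T
    D1s = Up.T @ D1 @ Vp
    D2s = Up.T @ D2 @ Vp
    # Equation 10 in the projected space: D1s w v2 = D2s w v1,
    # i.e. inv(D2s) D1s w = (v1 / v2) w. Plain eig stands in for QZ here.
    ratios, W = np.linalg.eig(np.linalg.solve(D2s, D1s))
    ratios, W = np.real(ratios), np.real(W)
    B_dual = Vp @ W                    # each column singles out one component
    A = D1 @ B_dual                    # a_n up to scale, following equation 9
    B = Vp @ np.linalg.inv(W).T        # b_n up to scale
    return A, B, ratios

# Error-free example with two components and known v_n1, v_n2.
rng = np.random.default_rng(1)
A0, B0 = rng.random((5, 2)), rng.random((4, 2))
v1, v2 = np.array([1.0, 2.0]), np.array([3.0, 1.0])
D1 = A0 @ np.diag(v1) @ B0.T
D2 = A0 @ np.diag(v2) @ B0.T
A_est, B_est, ratios = gram(D1, D2, 2)
```

The eigenvalues returned in `ratios` estimate the ratios of v_n1 to v_n2, and they must be distinct for the components to be separated.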

After the $\{\mathbf{b}_n\}$ vectors are found, the $\{\mathbf{a}_n\}$ vectors can be evaluated by

$$\mathbf{a}_n = \frac{\mathbf{D}_1 \mathbf{b}_n}{v_{n1}} \tag{11}$$


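Finally, the vectors $\{\mathbf{c}_n\}$ can be estimated by least squares from $\{\mathbf{a}_n\}$ and $\{\mathbf{b}_n\}$:

$$p_{kn} = \sum_{i=1}^{I} \sum_{j=1}^{J} x_{ijk} a_{in} b_{jn}, \qquad q_{nm} = (\mathbf{a}_n \cdot \mathbf{a}_m)(\mathbf{b}_n \cdot \mathbf{b}_m), \qquad \mathbf{C} = \mathbf{P} \mathbf{Q}^{-1} \tag{12}$$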
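
The least-squares step of equation 12 can be sketched as follows, assuming that P collects the inner products of each data slice with the component outer products and that Q is the element-wise product of the two Gram matrices. The function name `estimate_C` and the test data are hypothetical.

```python
import numpy as np

def estimate_C(X, A, B):
    """Least-squares estimate of C (K x N) from X (I x J x K), A (I x N), and B (J x N)."""
    P = np.einsum("ijk,in,jn->kn", X, A, B)   # p_kn = sum_ij x_ijk a_in b_jn
    Q = (A.T @ A) * (B.T @ B)                 # q_nm = (a_n . a_m)(b_n . b_m)
    # Q is symmetric, so solving Q C^T = P^T gives C = P Q^{-1}.
    return np.linalg.solve(Q, P.T).T

# Error-free check: C is recovered when A and B are known exactly.
rng = np.random.default_rng(2)
A, B, C_true = rng.random((5, 2)), rng.random((4, 2)), rng.random((3, 2))
X = np.einsum("in,jn,kn->ijk", A, B, C_true)
C_est = estimate_C(X, A, B)
```

Solving the linear system rather than forming the inverse of Q explicitly is a numerical convenience and does not change the result.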


The key to applying DTD in receptor modeling is to establish the relationship between the
terms in the model [equation 1] and the physical properties of the airshed. If the three-way data
matrix is arranged such that mode 1 (i) corresponds to the chemical species, mode 2 (j) to sampling
time period, and mode 3 (k) to site, the N vectors $\{\mathbf{a}_n\}$ can represent the source profiles. Two new
concepts are introduced so that the model can be fitted into the physical system. "Emission pattern"
is used to reflect the relative variations of source emission strength changing with time at the
receptors. Like a source profile, the emission pattern can be considered to be another property of a
source. Each source has its own emission pattern. With the source profile and emission pattern, a
source can be characterized more completely and accurately. The vectors $\{\mathbf{b}_n\}$ in the model then
correspond to the emission patterns.

Another concept, the site coefficient, is introduced to refer to the vectors $\{\mathbf{c}_n\}$ in
equation 1. The site coefficients indicate the variation in source strength from site to site. With these
definitions, a three-way air quality data set could be decomposed into source profiles ($\mathbf{a}_n$), emission
patterns ($\mathbf{b}_n$), and site coefficients ($\mathbf{c}_n$). However, each of these profiles is a relative profile, and its
values may not agree with actual physical scales.

The ultimate goal of receptor modeling is not only to identify sources, but also to apportion
the sample mass among the sources. In order to estimate the mass contributions, a matrix
reconstruction procedure is introduced after DTD. The mass contribution of source n at site k in
period j, i.e., $z_{njk}$, can be obtained by combining the emission pattern and the site coefficient:

$$z_{njk} = b_{jn} c_{kn} \tag{13}$$

If the source profile $y_{in}$ is defined as

$$y_{in} = a_{in} \tag{14}$$

the model in scalar form becomes:

$$x_{ijk} = \sum_{n=1}^{N} a_{in} b_{jn} c_{kn} = \sum_{n=1}^{N} y_{in} z_{njk} \tag{15}$$


The scales of $y_{in}$ and $z_{njk}$ do not necessarily agree with the actual scales of source profiles and mass
contributions. Similar to TTFA (Hopke, 1988; 1989), a regression procedure can be used to find a
set of scaling coefficients, $s = \{s_n\}$, such that

$$x_{ijk} = \sum_{n=1}^{N} \left( \frac{y_{in}}{s_n} \right) (s_n z_{njk}) = \sum_{n=1}^{N} \tilde{y}_{in} \tilde{z}_{njk} \tag{16}$$

where $\tilde{y}_{in}$ and $\tilde{z}_{njk}$ are the scaled source profiles and contributions. The regression is based on
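
$$m_{jk} = \sum_{n=1}^{N} \tilde{z}_{njk} = \sum_{n=1}^{N} s_n z_{njk} \tag{17}$$

where $m_{jk}$ is the measured sample mass at site k in period j.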
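
The reconstruction and scaling steps of equations 13 to 17 can be sketched as below. This is a minimal illustration, assuming that the regression is an ordinary least-squares fit without an intercept of the measured sample mass on the unscaled contributions. The function name `scale_solution`, the `mass` input, and the test values are hypothetical.

```python
import numpy as np

def scale_solution(A, B, C, mass):
    """mass has shape (J, K): assumed total measured mass for each period and site."""
    Z = np.einsum("jn,kn->njk", B, C)            # z_njk = b_jn c_kn (equation 13)
    n_sources = Z.shape[0]
    design = Z.reshape(n_sources, -1).T          # rows: (j, k) pairs, columns: sources
    # Least-squares fit of mass on the unscaled contributions (no intercept assumed).
    s, *_ = np.linalg.lstsq(design, mass.ravel(), rcond=None)
    Y_scaled = A / s                             # y_in / s_n
    Z_scaled = Z * s[:, None, None]              # s_n z_njk
    return s, Y_scaled, Z_scaled

# Error-free example: the decomposition is off by arbitrary factors t per source.
rng = np.random.default_rng(3)
Y_true = rng.random((6, 2))
B, C_true = rng.random((5, 2)), rng.random((3, 2))
t = np.array([2.0, 0.5])
mass = np.einsum("jn,kn->jk", B, C_true)         # total mass = sum of true contributions
A_unscaled, C_unscaled = Y_true * t, C_true / t
s, Y_scaled, Z_scaled = scale_solution(A_unscaled, B, C_unscaled, mass)
```

In this error-free example the fitted coefficients `s` equal the arbitrary factors `t`, so the scaled profiles return to their true values.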


Figure 1 illustrates the overall DTD procedure to provide a quantitative source resolution. The
application of DTD in receptor modeling was examined in methodological studies using simulated
data sets and a small real data set. Exact reproduction of error-free data was obtained and quite good
reproduction was found even for realistic levels of error in the data. The results of these studies are
presented by Zeng and Hopke (1992b). We are currently pursuing the use of this method with the
APIOS particulate data and those studies will be the subject of a future report.

Figure 1. Outline of DTD model.


POTENTIAL SOURCE CONTRIBUTION FUNCTION

We have explored another approach to data analysis based on combining meteorology with
chemical composition data. The method was presented by Malm et al. (1986) in their analysis of
particulate composition data from samples obtained at the Grand Canyon. They were able to relate a
variety of different source types to probable locations in the western United States. We have applied
these methods to precipitation chemistry data for 1984 to 1986 from the Dorset, Charleston Lake, and
Longwoods sites (Zeng and Hopke, 1989b). We have applied potential source contribution function
(PSCF) analysis to particulate data from Dorset using the trajectories calculated by D. Yap based on
surface meteorology (Zeng and Hopke, 1992c). We have also examined the PSCF results for the
trace elemental data for particles and precipitation samples obtained at Dorset from 1986 to 1988
(Hopke and Gao, 1992).
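
The PSCF calculation is not restated in this report. The sketch below assumes the form commonly given in the literature: for each grid cell, the number of back-trajectory endpoints belonging to samples above a concentration criterion is divided by the total number of endpoints in that cell. The function name `pscf`, the grid, the criterion value, and the sample data are all hypothetical.

```python
import numpy as np

def pscf(endpoints, concentrations, threshold, lon_edges, lat_edges):
    """endpoints: one (n_points, 2) array of (lon, lat) trajectory endpoints per sample."""
    n_all = np.zeros((len(lon_edges) - 1, len(lat_edges) - 1))
    n_high = np.zeros_like(n_all)
    for pts, conc in zip(endpoints, concentrations):
        pts = np.asarray(pts, dtype=float)
        counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=[lon_edges, lat_edges])
        n_all += counts                     # every endpoint counts toward the total
        if conc > threshold:                # assumed criterion: concentration above threshold
            n_high += counts
    with np.errstate(invalid="ignore", divide="ignore"):
        # Cells never crossed by a trajectory are left undefined (NaN).
        return np.where(n_all > 0, n_high / n_all, np.nan)

# Two samples on a 2 x 2 grid: the first is "high", the second is "low".
edges = np.array([0.0, 1.0, 2.0])
endpoints = [np.array([[0.5, 0.5], [0.2, 0.7]]), np.array([[0.5, 0.5], [1.5, 1.5]])]
field = pscf(endpoints, [10.0, 1.0], threshold=5.0, lon_edges=edges, lat_edges=edges)
```

In this example, two of the three endpoints in the first cell belong to the high-concentration sample, so that cell has a value of 2/3, while the cell reached only by the low-concentration sample has a value of 0.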

Concern has been raised in discussions of our work that there could be problems
because we have used only trajectories based on surface meteorology and have looked backward in time
for a period of only 48 hours. Thus, we asked Dr. M. Olson of the Atmospheric Environment Service
(AES) to calculate trajectories at multiple heights in the atmosphere (1000, 925, and 850 hPa) using
the AES three-dimensional trajectory code (Olson et al., 1978). On the basis of these trajectories, we
have calculated PSCF fields at each of the three pressure levels. In addition, we have examined
combinations of two and of all three levels by calculating the Total Potential Source Contribution
Function (Cheng et al., 1992). We have found that the incorporation of the two lower levels (1000
and 925 hPa) provides a good representation of the known emission sources for SO2, while the
inclusion of the 850 hPa level suggested areas that are too far from Dorset to be significant source regions.

Other studies of the utility of this method as a function of the distance scale of the transport
have also been performed. A study of the transport of gaseous and ionic species measured during the
Southern California Air Quality Study (Gao et al., 1992) was performed to examine the utility of
PSCF at a sub-regional scale. In this study, the trajectories are resolved on a grid with a cell size of
5 km x 5 km, and individual sources can be identified through their presence in particular high potential cells.
The utility of PSCF analysis at various scale lengths of atmospheric transport has been
reviewed by Hopke et al. (1992), where it appears that the method can provide useful information
over the range of scales from urban to semi-global, assuming that adequate back-trajectory information
is available. It is anticipated that PSCF can be applied to the identification of source locations of
other atmospheric species such as hazardous organic compounds. Such work will be pursued during
the next grant period.


SUMMARY 

The primary objective during the past three years was to develop three-way data analysis as
a quantitative tool for receptor modeling. The initial results have been encouraging and further
analysis of Ministry data is now in progress. We have also continued our development and study of
the Potential Source Contribution Function by incorporating multiple height data into the analysis and
examining the location resolution as a function of transport scale. Again, useful results have been
obtained for Ministry data as well as data from other locations, and additional applications of the
method can be anticipated.


PERSONNEL 

This project was directed by Philip K. Hopke, Robert A. Plane Professor of Chemistry at 
Clarkson University in Potsdam, N.Y. It has supported the Ph.D. thesis work of Yousheng Zeng 